Optimal Decision Trees for Categorical Data via Integer Programming

Authors

  • Oktay Günlük
  • Jayant Kalagnanam
  • Katya Scheinberg
Abstract

Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.
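
As a rough illustration of what a combinatorial decision on a categorical feature looks like, the sketch below searches all subsets of a feature's values for the one that best separates two classes at a single node. This is not the paper's mixed integer programming formulation: it is a brute-force stand-in for one node only, and the function name `best_subset_split` and the toy data are hypothetical.

```python
from itertools import combinations


def best_subset_split(values, labels):
    """Find the subset S of category values for which the rule
    'predict 1 if value in S, else 0' makes the fewest training errors.

    Exhaustive search over all 2^k subsets, so only usable for small k.
    """
    categories = sorted(set(values))
    best_subset, best_errors = frozenset(), len(labels) + 1
    for size in range(len(categories) + 1):
        for combo in combinations(categories, size):
            subset = frozenset(combo)
            # Count samples whose predicted class differs from the label.
            errors = sum((v in subset) != bool(y) for v, y in zip(values, labels))
            if errors < best_errors:
                best_subset, best_errors = subset, errors
    return best_subset, best_errors


# Hypothetical toy data: one categorical feature, binary labels.
colors = ["red", "green", "blue", "red", "blue", "green", "yellow"]
labels = [1, 0, 1, 1, 1, 0, 0]
subset, errors = best_subset_split(colors, labels)
```

The number of subsets doubles with every additional category value, and a full tree couples such choices across nodes, which is one reason to pose the problem as an optimization model and hand it to a solver rather than enumerate.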

Similar resources

Optimal Generalized Decision Trees via Integer Programming

Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees ...

ALTERNATIVE MIXED INTEGER PROGRAMMING FOR FINDING EFFICIENT BCC UNIT

Data Envelopment Analysis (DEA) cannot provide adequate discrimination among efficient decision making units (DMUs). Discriminating among these efficient DMUs is an interesting research subject. The purpose of this paper is to develop the mixed integer linear model which was proposed by Foroughi (Foroughi A.A. A new mixed integer linear model for selecting the best decision making units in data envelo...

A Two Stage Stochastic Programming Model of the Price Decision Problem in the Dual-channel Closed-loop Supply Chain

In this paper, we propose a new model for designing integrated forward/reverse logistics based on pricing policy in direct and indirect sales channels. The proposed model includes producers, a disposal center, distributors and final customers. We assume that the location of final customers is fixed. First, a deterministic mixed integer linear programming model is developed for integrated logistic...

A Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study

A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...

I. INTRODUCTION DECISION tree induction is a popular method for mining knowledge from data by means of decision tree building

In decision analysis, decision trees are commonly used as a visual support tool for identifying the best strategy that is most likely to reach a desired goal. A decision tree is a hierarchical structure normally represented as a tree-like graph model. The tree consists of decision nodes, splitting paths based on the values of a decision node, and sink nodes representing final decisions. In data...

Publication date: 2018